RNA denaturation: excluded volume, pseudoknots and transition scenarios 
A lattice model of RNA denaturation which fully accounts for the excluded volume effects among 
nucleotides is proposed. A numerical study shows that interactions forming pseudoknots must be 
included in order to get a sharp continuous transition. Otherwise a smooth crossover occurs from the 
swollen linear polymer behavior to highly ramified, almost compact conformations with secondary 
structures. In the latter scenario, which is appropriate when these structures are much more stable 
than pseudoknot links, probability distributions for the lengths of both loops and main branches 
obey scaling with nonclassical exponents. 

PACS numbers: 87.15.Aa, 87.14.Gg, 05.70.Fh, 64.60.Fr 

In recent years, considerable attention has been devoted to the
problem of describing the formation of secondary structure (base
pairing map) in single molecular strands of RNA. The solution of such
a problem is regarded as an important step within the general program
of understanding how structure is encoded in the primary sequence of
biopolymers. By making use of some simplifications, like that of
disregarding excluded volume effects or pseudoknot formation, some
studies established the existence of a molten phase at relatively high
temperatures for an RNA molecule in dilute solution. In this phase the
inhomogeneities associated with a specific primary sequence should be
irrelevant for the large scale behavior and should allow the
coexistence of a very large number of different secondary structures
of comparable free energy.

As the temperature T increases, a long RNA molecule should pass from
the molten phase to a regime in which secondary structures essentially
disappear and the global behavior becomes that of a linear polymer
chain in good solvent. Excluded volume should play a relevant role at
such a denaturation transition. Indeed, there the entropic free energy
gain associated with the formation of hairpins or of more complicated
branched structures with loops is comparable with the corresponding
base pair binding and stacking energies, and depends crucially on the
repulsive interactions. Recent studies have shown that the
discontinuous nature and the universal features of double stranded DNA
denaturation are determined by excluded volume interactions [13, 14].

To our knowledge, starting with the related pioneering work of de
Gennes [15] on the statistics of branchings and hairpin helices in the
periodic dAT copolymer, excluded volume effects were never fully taken
into account in studies of RNA denaturation. This leaves open the
problem of establishing the existence and of determining the possible
character of this transition in the long chain limit. A realistic
embedding of the system in space, taking into account excluded volume,
is also a necessary condition for discussing pseudoknots and their
consequences. Pseudoknots occur, e.g., when two loops locally bind to
each other, causing the configuration to deviate from planar topology
[8]. Normally they are not included in models of the secondary
structure, or are considered as a perturbation.

In this Letter we propose a model of the large scale conformational
behavior of RNA in the high T and molten phases. Although schematic,
our model takes fully into account excluded volume and allows control
of the effects of pseudoknots. While providing useful information on
the behavior of finite RNA chains, an extensive numerical analysis
allows us to draw precise scenarios for denaturation and the
associated scaling regimes in different conditions.

At a coarse-grained level we model a conformation of the RNA strand as
a two-tolerant trail of N steps on the face centered cubic (FCC)
lattice. This is a random walk in which no more than two steps are
allowed to overlap on a single lattice bond, forming what we call a
contact. This restriction takes into account the excluded volume. In
addition, by giving an orientation to the trail, we impose that only
pairs of antiparallel steps can form contacts, and whenever this
happens a gain in energy $\varepsilon < 0$ is counted. The orientation
of the trail reflects the backbone directionality of RNA. In our model
a contact corresponds to a sequence of bound base pairs over a
distance of the order of the persistence length of the RNA double
helix. This persistence length can vary with T near a denaturation
transition. However, this effect should not matter for the large scale
properties. Since our model is coarse-grained and we are not
interested in T's below those of the molten phase, we also neglect the
heterogeneity of base pair interactions. Thus, $\varepsilon$
represents an effective, average parameter.
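
The rules just stated can be made concrete with a short sketch. The
function below is only an illustration, not the code used for the
results in this Letter: the name `count_contacts`, the representation
of a trail as a list of integer site coordinates, and the choice of
placing no restriction on revisiting sites are all assumptions made
for the example.

```python
from collections import defaultdict
from itertools import product

# The 12 nearest-neighbour vectors of the FCC lattice:
# exactly two non-zero components, each equal to +1 or -1.
FCC_STEPS = frozenset(
    v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2
)


def count_contacts(sites):
    """Return the number of contacts N_c of an oriented trail.

    `sites` is the ordered list of visited lattice sites (hypothetical
    input format). Returns None if the trail breaks one of the rules:
    a step is not an FCC bond, a bond carries more than two steps,
    or two steps overlap with the same (parallel) orientation.
    """
    sites = [tuple(s) for s in sites]
    visits = defaultdict(list)  # undirected bond -> traversal directions
    for a, b in zip(sites, sites[1:]):
        step = tuple(q - p for p, q in zip(a, b))
        if step not in FCC_STEPS:
            return None
        bond = (min(a, b), max(a, b))
        visits[bond].append(a < b)  # direction flag along the bond
    contacts = 0
    for directions in visits.values():
        if len(directions) > 2:
            return None  # violates two-tolerance
        if len(directions) == 2:
            if directions[0] == directions[1]:
                return None  # parallel overlap is not allowed
            contacts += 1  # antiparallel pair of steps: one contact
    return contacts
```

For a configuration `w` accepted by this check, the energy entering
the Boltzmann weight is simply $\varepsilon$ times the returned value.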

FIG. 1: RNA configurations and corresponding contact maps. Overlapped
steps (contacts) are slightly split. In (a) a pseudoknot is present
(crossing of bridge "1" with bridge "2"). In (b) a loop of length
$\ell = 5$ is marked by a thicker line, both on the chain and in the
contact map. Here, "1" and "4" are main bridges.

Figure 1 reports schematically two possible configurations of our
model. In both (a) and (b) a diagram on the right summarizes the
corresponding contact map. A bridge in the diagrams connects each pair
of steps forming a contact. Bridges are numbered in order of
appearance if one follows the trail orientation. A main bridge is a
bridge which is not inscribed within other, larger bridges. Unlike
(b), (a) shows a pseudoknot, indicated by the crossing of two bridges
in the diagram. This crossing means that a step forming a loop
overlaps with one outside the loop. In order to investigate the role
of pseudoknots, we consider two variants of the model, which we refer
to as I and II. While in model I configurations with pseudoknots are
allowed, in addition to those without pseudoknots, in model II the
former are forbidden altogether. The choice of attributing the same
energy to all kinds of contacts is a simplification of model I.
Computationally it would be awkward to selectively attribute a weaker
binding energy to those contacts which form pseudoknots, as physically
appropriate in most situations.
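
The contact-map notions introduced above translate directly into
simple tests on pairs of step indices. The following is a minimal
sketch under the assumption that each bridge is stored as a pair
`(i, j)` with `i < j`, the positions along the trail of the two steps
in contact; the function names are hypothetical and chosen for this
example only.

```python
def has_pseudoknot(bridges):
    """True if at least two bridges cross in the contact map.

    Two bridges (i, j) and (k, l) cross when i < k < j < l, i.e. one
    step of the second contact lies inside the first bridge and the
    other lies outside it.
    """
    ordered = sorted(bridges)
    return any(
        k < j < l
        for n, (i, j) in enumerate(ordered)
        for (k, l) in ordered[n + 1:]
    )


def main_bridges(bridges):
    """Bridges that are not inscribed within another, larger bridge."""
    return [
        (i, j)
        for (i, j) in bridges
        if not any(k < i and j < l for (k, l) in bridges)
    ]
```

In this language, model II keeps only configurations for which
`has_pseudoknot` is false, while model I accepts both kinds.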

Thermodynamic quantities and canonical averages are defined in terms
of the partition function $Z = \sum_{w} \exp(-H(w)/T)$. The sum
extends to all allowed configurations $w$ with $|w| = N$ steps, and
$H = \varepsilon N_c(w)$, $N_c(w)$ being the number of contacts in
$w$. For both models, we sampled configurations by a multiple Markov
chain Monte Carlo procedure using several ($\approx 20$) temperatures
satisfying $0 < \varepsilon/T < 3.5$ [27].
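
In a multiple Markov chain scheme, chains running at neighboring
temperatures periodically attempt to exchange their configurations.
The sketch below shows the standard Metropolis test for such an
exchange when the weight of a configuration is $\exp(K N_c)$. It is a
generic illustration and not necessarily the implementation used
here: the function name, its arguments, and the use of a positive
coupling `K` standing for the magnitude of $\varepsilon/T$ are
assumptions of the example.

```python
import math
import random


def swap_accepted(k_a, k_b, nc_a, nc_b, rng=random):
    """Metropolis test for exchanging the states of two chains.

    Chain a runs at coupling k_a and currently holds a configuration
    with nc_a contacts; likewise for chain b. With weights exp(K * N_c),
    the ratio of joint weights after and before the exchange is
    exp((k_a - k_b) * (nc_b - nc_a)).
    """
    log_ratio = (k_a - k_b) * (nc_b - nc_a)
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)
```

An exchange that moves the configuration with more contacts to the
chain with larger coupling is always accepted; the opposite move is
accepted with an exponentially small probability.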

We first computed as a function of T the specific heat of models I and
II for different N. At a continuous conformational transition with
crossover exponent $\phi < 1$ one expects a singular behavior
$C_{max} \sim N^{2\phi - 1}$ for the maximum of this quantity as
$N \to \infty$ [19, 20]. In case $\phi < 1/2$ such singularity does
not imply a divergence. For both models we find no evidence of a
diverging $C_{max}$. Hence, at this level we can only conclude that
for both models the denaturation transition must be continuous and
with $\phi < 1/2$, if it exists.
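
For illustration, the specific heat can be estimated from the
fluctuations of the number of contacts in a sample, and the scaling
law above can be inverted to obtain an effective crossover exponent
from the peak heights at two chain lengths. This is a sketch under
stated assumptions: the specific heat is taken per step, `coupling`
denotes the magnitude of $\varepsilon/T$, and both function names are
hypothetical.

```python
import math
from statistics import pvariance


def specific_heat_per_step(contact_counts, coupling, n_steps):
    """C = (<H^2> - <H>^2) / (N T^2) = coupling^2 * Var(N_c) / N.

    `contact_counts` is a sample of N_c values collected at one
    temperature; `coupling` stands for |epsilon| / T.
    """
    return coupling ** 2 * pvariance(contact_counts) / n_steps


def effective_phi(n1, c_max1, n2, c_max2):
    """Invert C_max ~ N^(2*phi - 1) using peak heights at two sizes."""
    return 0.5 * (1.0 + math.log(c_max2 / c_max1) / math.log(n2 / n1))
```

A peak height that stops growing with N gives an effective exponent
of 1/2 or less, which is the situation described above for both
models.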

We also determined two geometrical radii of the configurations, namely
the end-to-end distance, $R_e$, and the radius of gyration with
respect to the center of mass, $R_g$. Multicritical phenomena theory
[19, 21] has taught us that the ratios of the averages of such radii
in the $N \to \infty$ limit are universal numbers characteristic of
the different regimes involved in the transition. Fig. 2 shows plots
of $\langle R_e \rangle^2 / \langle R_g \rangle^2$ for different N.
For model I the trend of the curves gives indication of a sharp
transition at $\varepsilon/T \approx 1.9$. Indeed, for high T the
ratio approaches from below the universal value 6.25(1) appropriate
for a polymer in the swollen, self avoiding walk (SAW) regime [22]. On
the other hand, at very low T's the trail should fold in double
structures with maximal number of contacts ($N_c \simeq N/2$), in
which $R_e$ necessarily approaches zero. This explains the trend
towards zero (from above) of the curves at low T. Remarkable is the
accumulation of intersection points for $\varepsilon/T \approx 1.9$.
These intersections mark a change of the trend of the curves for
increasing N and suggest the presence of a peculiar transition regime
with universal ratio $\approx 4.8$. Hence, for model I there is clear
evidence of a second order transition at $\varepsilon/T \approx 1.9$.
For model II there is no similar indication: the intersections are
pushed towards lower and lower T's as one compares curves
corresponding to pairs of increasing N values (Fig. 2 inset). This
means that the larger N, the deeper the SAW regime extends in the low
T region. The whole pattern suggests for model II a smooth crossover,
not a transition.

FIG. 2: $\langle R_e \rangle^2 / \langle R_g \rangle^2$ as a function
of $\varepsilon/T$ for three different values of N. Inset: detail of
the crossings of four curves for model II. The circles enclose
intersections between the curve pairs (300, 400), (400, 600) and
(600, 800).

Further insight is provided by the study
of some scaling properties. The radius of gyration is expected to
scale as $\langle R_g \rangle \sim N^{\nu}$ for large N [19]. For both
models we observe that at high T the determinations of $\nu$ at finite
N approach, for $N \to \infty$, a value $\approx 0.59$ appropriate for
a SAW in d = 3. For model I, at T's sufficiently below the transition
the $\nu$ estimates can be extrapolated to $\approx 0.35$ for large N.
This indicates that the configurations are very close to compact in
the low T, molten phase ($\nu \approx 1/d = 1/3$). For model II at
very low T's we extrapolate $\nu \approx 0.4$, which is also not far
from $\nu = 1/2$, as expected for branched polymers.
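
As a concrete reading of the quantities used in this analysis, the
sketch below computes the squared end-to-end distance and squared
radius of gyration of a single configuration, and a finite-size
estimate of $\nu$ from averaged radii at two chain lengths. The
function names and the two-point estimator are assumptions made for
illustration; the extrapolation procedure actually used for the
quoted values is not specified in the text.

```python
import math


def end_to_end_sq(sites):
    """Squared distance between the first and the last site."""
    return sum((b - a) ** 2 for a, b in zip(sites[0], sites[-1]))


def gyration_sq(sites):
    """Mean squared distance of the sites from their center of mass."""
    n = len(sites)
    center = [sum(s[d] for s in sites) / n for d in range(3)]
    return sum(
        sum((s[d] - center[d]) ** 2 for d in range(3)) for s in sites
    ) / n


def effective_nu(n1, rg1, n2, rg2):
    """Two-point estimate of nu from <R_g> ~ N^nu at sizes n1 < n2."""
    return math.log(rg2 / rg1) / math.log(n2 / n1)
```

Estimates obtained from successive pairs of sizes can then be followed
as N grows to judge towards which value they drift.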

The different behaviors of the two models are due to the presence of
pseudoknots in model I. Indeed, the fraction of sampled configurations
with pseudoknots in model I is already substantial and increases with
N at high T. It soon reaches values close to 1 near the transition and
below. Pseudoknots correspond to the formation of extra binding
contacts and thus can lead to more compact configurations with respect
to the case of model II. These extra contacts trigger the sharp
transition observed at $\varepsilon/T \approx 1.9$. For model II, if
present, a transition should be located at much lower T's, most likely
below the range of applicability of the model.

The possibility of forming pseudoknots is a driving factor in tertiary
structure formation [4]. However, model I somewhat overamplifies this
factor, because it gives pseudoknot forming contacts an energy equal
to that of the other contacts. In fact, contacts forming pseudoknots
should correspond most often to the weak binding of quite short
portions of single strand loops, as for kissing hairpins. Longer
bindings giving rise to pseudoknots are expected to be kinetically
inhibited. One way to make the energies of contacts forming
pseudoknots closer to those of the other contacts is to introduce
sufficiently high concentrations of divalent metal ions, like
Mg$^{2+}$, in solution [23]. On the other hand, in model II pseudoknot
forming contacts would appear if one looked at the details of the
configurations at a somewhat more coarse-grained level. This means
that this model can be interpreted as one similar to model I, but
giving essentially zero energy to such contacts. Thus, it is
reasonable to expect the description of model II to be most
appropriate for not too low T's, and especially when, e.g., a low
concentration of Mg$^{2+}$ ions in solution enlarges the stability gap
between secondary structure and pseudoknot forming links.

FIG. 3: Log-log plots for $P_{loop}(\ell)$ (thick lines, shifted down
by 0.5 for clarity) and $P_{ret}(n)$ (thin lines, red online), both
for N = 800 and for different T values. The dot-dashed lines have
slope $-1.1$.

RNA denaturation corresponds to a substantial suppression with
increasing T of the highly ramified structure of loops and branches
characterizing the molten phase. The analysis of the loops is feasible
and particularly instructive in model II. For DNA, the distribution of
the lengths of denaturated loops, corresponding to openings of the
double helix, follows a power law whose exponent c determines the
character of the transition [13, 14].
A simple example of a loop in RNA is given by the closure of an
isolated hairpin. In this case the loop is connected to the rest of
the structure by a single branch of double steps in model II. Of
course, more complicated situations may occur (thick loop in
Fig. 1(b)). Even at high T an extensive number of minor spike-like
branches is present along the RNA backbone. Thus, loops with a fixed
number of branches naturally have length distributions with a rather
short cut-off. Therefore, we decided to sample the length of all loops
identified in the various configurations, irrespective of the number
of outgoing double step branches. The various lengths can be obtained
from the contact map of each configuration, by using a recursive
algorithm that identifies all the loops inside each main bridge in the
diagram. Another interesting quantity is the return time, i.e. the
total number of steps comprised within a main bridge. In the assumed
planar topology of model II this time is the total arc length
corresponding to each departure of the configuration from a
contact-free, linear polymer behavior. Probability distributions of
the loop lengths, $\ell$, and of the return times, $n$, are plotted in
Fig. 3 for different T's and for N = 800. At high T, after transients
both distributions behave as power laws with approximately identical
exponents: $P_{loop}(\ell) \sim \ell^{-c_l}$, $P_{ret}(n) \sim
n^{-c_r}$, with $c_l \simeq c_r = 1.1(1)$. The peaks at small $\ell$
and $n$ in the distributions indicate that loops at high T mostly
occur within isolated small hairpins in model II. The identity of
$c_l$ and $c_r$ means that almost all large bridges are also main
bridges. Thus, the return time essentially coincides with the loop
length at high T. At lower T, while $P_{loop}$ becomes shorter and
shorter ranged
for decreasing T, the behavior of $P_{ret}$ remains of power law type
at large arguments. The value of $c_r$ remains stable and close to
that estimated for $\varepsilon/T = 0$. This means that as the RNA
molecule enters deeper and deeper into the molten phase with developed
secondary structures, the loops become shorter and shorter. On the
other hand, the main branches departing from the contact-free backbone
encompass all accessible length scales, as appropriate for a branched
polymer. This could also explain why in this range of T the exponent
$\nu$ discussed above is not far from 1/2, as for branched polymers
[19]. The exponent $c_r$ obtained here definitely deviates from the
mean field value 3/2.
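
A recursive scan of a planar contact map, of the kind mentioned above,
could look as follows. This is a sketch under explicit assumptions and
not the algorithm used for Fig. 3: bridges are pairs `(i, j)` of step
indices with no crossings (model II), the length of a loop is taken to
be the number of unpaired steps directly enclosed by a bridge (steps
inside nested bridges are not counted), zero-length loops between
stacked contacts are skipped, and the return time of a main bridge is
counted as `j - i + 1`. The function name is hypothetical.

```python
def loops_and_returns(bridges):
    """Collect loop lengths and return times from a planar contact map."""
    partner = {min(b): max(b) for b in bridges}  # opening -> closing step
    loop_lengths, return_times = [], []

    def scan(start, stop, top_level):
        """Scan steps start..stop-1 at one nesting level.

        Returns the number of unpaired steps found at this level.
        """
        free, s = 0, start
        while s < stop:
            if s in partner:
                j = partner[s]
                if top_level:
                    # A bridge met on the contact-free backbone is main.
                    return_times.append(j - s + 1)
                inner = scan(s + 1, j, False)  # loop closed by (s, j)
                if inner > 0:
                    loop_lengths.append(inner)
                s = j + 1  # jump past the whole bridge
            else:
                free += 1
                s += 1
        return free

    if partner:
        scan(0, max(partner.values()) + 1, True)
    return loop_lengths, return_times
```

The recursion depth equals the maximal nesting of bridges, which for
chains of a few hundred steps stays below Python's default recursion
limit. Histograms of the two returned lists, accumulated over sampled
configurations, would then give estimates of the two distributions.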

Summarizing, for model I we could establish the existence of a sharp
denaturation transition which is due to the presence of pseudoknots.
Unlike the melting of DNA, this is a second order transition. Model I
applies to experimental situations in which the stability gap between
secondary and tertiary folding is appreciably reduced, e.g., by a high
concentration of Mg$^{2+}$ ions. On the other hand, one can regard
model II as a more adequate description of RNA when the same stability
gap is large, e.g. with low Mg$^{2+}$ concentrations [23]. In this
model there is no sharp transition and the denaturation occurs as a
crossover from linear to branched-compact polymer behavior. The
geometry of this crossover is well described by the distributions
$P_{loop}$ and $P_{ret}$ and by their exponents, whose nonclassical
values are a further consequence of excluded volume.